The Parsimonious Approach to Constructing Fault-Tolerant Protocols
Authors
Abstract
Fault-tolerant distributed protocols, by definition, are designed with the worst in mind. This focus on tolerating the worst often leads to expensive designs that overlook a practical observation: disruptions and failures are rare relative to the lifetime or mission periods of many systems. The class of optimistic fault-tolerant protocols leverages that observation and strives for efficiency during normal operation of the system. In this paper, we describe a specialization of the optimistic approach to building fault-tolerant protocols that we call the parsimonious approach. In the parsimonious approach, we design the protocol with the explicit aim of achieving frugality or efficiency with respect to a given metric of interest M (such as latency degree, resource usage, or message complexity) while never violating correctness (i.e., safety and liveness). When certain operational assumptions are satisfied, the design uses a lightweight mechanism that provides the desired protocol functionality with optimal M. The optimistic hope is that those assumptions are satisfied more often than not, i.e., that the chosen parsimonious mechanism is applicable for most of the system's lifetime. For this hope to be realistic, however, the operational assumptions of the mechanism must have good coverage. The protocol design uses a more expensive fall-back or recovery mechanism whenever the assumptions are not satisfied; after ensuring the correctness properties, the protocol reverts to the parsimonious mechanism. Handling situations that contradict the assumptions requires detecting their occurrence in the first place, i.e., failure or anomaly detection [1]. Correctness must never be violated despite imperfections in the detection mechanism.
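The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's protocol: the class name `ParsimoniousRunner`, its fields, and the toy doubling task are all hypothetical. The sketch assumes that the lightweight mechanism can itself report when it cannot complete safely (by returning `None`), so that a mistake by the detector costs only efficiency and never correctness.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ParsimoniousRunner:
    """Hypothetical wrapper: use the cheap mechanism while assumptions
    appear to hold, otherwise use the expensive fall-back."""
    suspect: Callable[[int], bool]              # imperfect anomaly detector
    fast_path: Callable[[int], Optional[int]]   # lightweight; None = cannot complete safely
    fallback: Callable[[int], int]              # expensive, assumed always correct
    trace: List[str] = field(default_factory=list)

    def execute(self, request: int) -> int:
        # A suspicion (even a false one) only skips the cheap mechanism.
        if not self.suspect(request):
            result = self.fast_path(request)
            if result is not None:
                self.trace.append("fast")
                return result
        # Either the detector raised a suspicion or the fast path refused:
        # run the fall-back, which preserves correctness.
        self.trace.append("fallback")
        return self.fallback(request)


# Toy usage: the task is doubling an integer. The fast path is assumed to
# work only for non-negative inputs; the detector is deliberately imperfect.
runner = ParsimoniousRunner(
    suspect=lambda r: r > 100,
    fast_path=lambda r: r * 2 if r >= 0 else None,
    fallback=lambda r: r * 2,
)
results = [runner.execute(r) for r in [1, 200, -3, 4]]
```

In this toy run the second request triggers a false suspicion and the third is missed by the detector but refused by the fast path; both are served by the fall-back, and the fourth request returns to the cheap mechanism, mirroring the revert step described above.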
Similar resources
A Middleware for Constructing Highly Available, Fault Tolerant, and Attack Tolerant Services
This paper describes the design of a middleware that provides support for constructing highly available, secure, fault-tolerant, and attack-tolerant services. The central component of this middleware is a group communication service that comprises six network protocols: atomic broadcast, group membership, failure detection, attack detection, group access control, and secure intermember commu...
On Feasibility of Adaptive Level Hardware Evolution for Emergent Fault Tolerant Communication
A permanent physical fault in communication lines usually leads to a failure. The feasibility of evolution of a self organized communication is studied in this paper to defeat this problem. In this case a communication protocol may emerge between blocks and also can adapt itself to environmental changes like physical faults and defects. In spite of faults, blocks may continue to function since ...
An Approach to Constructing Modular Fault-Tolerant Protocols
Modularization is a well-known technique for simplifying complex software. Here, an approach to modularizing fault-tolerant protocols such as reliable multicast and membership is described. The approach is based on implementing a protocol’s individual properties as separate microprotocols, and then combining selected micro-protocols using an event-driven software framework; a system is construc...
Design of an Active Approach for Detection, Estimation and Short-Circuit Stator Fault Tolerant Control in Induction Motors
Three-phase induction motors have many applications in industry. Consequently, detecting and estimating a fault and compensating for it so that the faulty induction motor satisfies the predefined goals are important issues. One of the most common faults in induction motors is the short circuit of the stator winding. In this paper, an active fault-tolerant control system is designed and pres...
Efficient, scalable consistency for highly fault-tolerant storage
Fault-tolerant storage systems spread data redundantly across a set of storage-nodes in an effort to preserve and provide access to data despite failures. One difficulty created by this architecture is the need for a consistent view, across storage-nodes, of the most recent update. Such consistency is made difficult by concurrent updates, partial updates made by clients that fail, and failures ...